1 Introduction

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by a new type of coronavirus: severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The outbreak started in Wuhan, China in December 2019. The first known case of COVID-19 in the U.S. was confirmed on January 20, 2020, in a 35-year-old man who returned to Washington State on January 15 after traveling to Wuhan. Starting around the end of February, evidence emerged for community spread in the U.S.

We are all indebted to the heroes who fight COVID-19 across the world in different ways. For this data exploration, I am grateful to the many data science groups who have collected detailed COVID-19 outbreak data, including the number of tests, confirmed cases, and deaths, across countries/regions, states/provinces (administrative division level 1, or admin1), and counties (admin2). Specifically, I used data from three sources: JHU, the NY Times, and the COVID Tracking Project, each covered in its own section below.

2 JHU

Assume you have cloned the JHU GitHub repository on your local machine at "../COVID-19".

2.1 time series data

The time series provide counts (e.g., confirmed cases, deaths) starting from January 22, 2020 for 253 locations. Currently these time series files contain no data for individual US states.

Here is the list of 10 records with the largest number of cases or deaths on the most recent date.
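A listing of this kind can be sketched as follows. This is only an illustration under stated assumptions: the JHU time series files are assumed to be wide tables with one row per location and one column per date, and the small table built here uses made-up locations and counts in place of the real file in the cloned repository.

```r
# Made-up stand-in for a JHU time series file: one row per location,
# one column per date. Locations and counts are hypothetical.
csv_text <- paste(c(
  "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20",
  ",R1,0,0,5,8",
  "P1,R2,0,0,1,20",
  ",R3,0,0,0,2"
), collapse = "\n")

# check.names = FALSE keeps headers such as "1/22/20" unchanged.
confirmed <- read.csv(text = csv_text, check.names = FALSE,
                      stringsAsFactors = FALSE)

# The last column holds the most recent date.
latest <- names(confirmed)[ncol(confirmed)]

# Sort locations by the count on the most recent date, largest first.
top <- confirmed[order(confirmed[[latest]], decreasing = TRUE),
                 c("Province/State", "Country/Region", latest)]
head(top, 2)
```

With the real file, `read.csv()` would take the file path instead of `text = csv_text`, and `head(top, 10)` would give the 10 largest records.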

Next, I check the number of new cases/deaths for each country/region. These data are important for understanding the trend under different conditions, e.g., population density, social distancing policies, etc. Here I check the top 10 countries/regions with the highest number of deaths.
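Since the time series hold cumulative counts, new cases or deaths per day are the differences between consecutive dates. Below is a minimal sketch, assuming a wide table with one column per date; the column name `region` and all counts are hypothetical, and this is not necessarily how the plots in this report were produced.

```r
# Made-up cumulative counts: two provinces of region R1 and one row for R2.
confirmed <- data.frame(
  region    = c("R1", "R1", "R2"),
  `1/22/20` = c(1, 0, 3),
  `1/23/20` = c(2, 1, 3),
  `1/24/20` = c(4, 1, 7),
  check.names = FALSE, stringsAsFactors = FALSE
)

date_cols <- setdiff(names(confirmed), "region")

# Sum provinces within each region: one row of cumulative counts per region.
by_region <- rowsum(confirmed[, date_cols], group = confirmed$region)

# Daily new counts: difference between consecutive cumulative columns.
new_counts <- t(apply(by_region, 1, diff))
new_counts
```

The result has one row per region and one column per date from the second date onward, because the first date has no previous day to subtract.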

2.2 daily reports data

The raw data from Hopkins are in the format of daily reports, with one file per day. More recent files (since March 22nd) include information from individual US states or individual counties, as shown in the following figure. So I turn to NY Times data for information on individual states or counties.
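One way to see which daily files carry county information is to read every file and inspect its column names. The sketch below writes two tiny stand-in files to a temporary directory so that it runs on its own; the file naming (`MM-DD-YYYY.csv`) and the `Admin2` county column are assumptions about the JHU layout, and all rows are made up.

```r
# Two tiny stand-in daily reports in a temporary directory (made-up rows).
report_dir <- file.path(tempdir(), "daily_reports_demo")
dir.create(report_dir, showWarnings = FALSE)
writeLines(c("Province/State,Country/Region,Confirmed,Deaths",
             "P1,R1,10,1"),
           file.path(report_dir, "03-21-2020.csv"))
writeLines(c("Admin2,Province_State,Country_Region,Confirmed,Deaths",
             "C1,P1,R1,12,1"),
           file.path(report_dir, "03-22-2020.csv"))

# Read every daily report into a list, named by its date.
files <- list.files(report_dir, pattern = "\\.csv$", full.names = TRUE)
reports <- lapply(files, read.csv, check.names = FALSE,
                  stringsAsFactors = FALSE)
names(reports) <- sub("\\.csv$", "", basename(files))

# Flag the files that have a county (admin2) column.
has_county <- vapply(reports, function(r) "Admin2" %in% names(r), logical(1))
has_county
```

With the cloned repository, `report_dir` would point at the directory of daily reports instead of the temporary directory.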

3 NY Times

The data from NY Times are saved in two text files, one for state level information and the other one for county level information.

The current date is

## [1] "2020-06-13"

3.1 state level data

First check the 30 states with the largest number of deaths.

##            date                state fips  cases deaths
## 5658 2020-06-13             New York   36 387402  30565
## 5656 2020-06-13           New Jersey   34 166605  12589
## 5647 2020-06-13        Massachusetts   25 105395   7576
## 5639 2020-06-13             Illinois   17 133117   6491
## 5665 2020-06-13         Pennsylvania   42  82988   6264
## 5648 2020-06-13             Michigan   26  66024   6017
## 5629 2020-06-13           California    6 150418   5059
## 5631 2020-06-13          Connecticut    9  44994   4186
## 5644 2020-06-13            Louisiana   22  46396   3004
## 5646 2020-06-13             Maryland   24  61935   2926
## 5634 2020-06-13              Florida   12  73544   2924
## 5662 2020-06-13                 Ohio   39  40848   2554
## 5640 2020-06-13              Indiana   18  40535   2413
## 5635 2020-06-13              Georgia   13  54178   2411
## 5671 2020-06-13                Texas   48  88120   1989
## 5630 2020-06-13             Colorado    8  29002   1598
## 5675 2020-06-13             Virginia   51  53869   1541
## 5649 2020-06-13            Minnesota   27  30203   1314
## 5676 2020-06-13           Washington   53  26920   1216
## 5627 2020-06-13              Arizona    4  34773   1190
## 5659 2020-06-13       North Carolina   37  42911   1135
## 5651 2020-06-13             Missouri   29  16327    890
## 5650 2020-06-13          Mississippi   28  19348    889
## 5667 2020-06-13         Rhode Island   44  15947    833
## 5625 2020-06-13              Alabama    1  24601    773
## 5678 2020-06-13            Wisconsin   55  22638    694
## 5641 2020-06-13                 Iowa   19  23792    651
## 5668 2020-06-13       South Carolina   45  17955    599
## 5643 2020-06-13             Kentucky   21  12605    527
## 5633 2020-06-13 District of Columbia   11   9709    511
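A listing like the one above can be produced by keeping only the most recent date and sorting by deaths. The sketch below is one possible way to do it, not necessarily the code used for this report. It reads four rows copied from the listing above, deliberately out of order, from an inline string; with the real state-level file (assumed here to have the same five columns) the file path would replace the `text` argument.

```r
# Four rows copied from the state listing above, deliberately out of order.
states <- read.csv(text = paste(c(
  "date,state,fips,cases,deaths",
  "2020-06-13,California,6,150418,5059",
  "2020-06-13,New York,36,387402,30565",
  "2020-06-13,Illinois,17,133117,6491",
  "2020-06-13,New Jersey,34,166605,12589"
), collapse = "\n"), stringsAsFactors = FALSE)

# Keep the rows for the most recent date only.
latest <- max(as.Date(states$date))
today  <- states[as.Date(states$date) == latest, ]

# Sort by cumulative deaths, largest first, and show the top rows.
top <- today[order(today$deaths, decreasing = TRUE), ]
head(top, 3)
```

Replacing `head(top, 3)` with `head(top, 30)` on the full file would give the 30 states with the most deaths.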

For these 20 states, I check the number of new cases and the number of new deaths. Part of the reason for this check is to identify whether there is any similarity in these patterns. For example, can the pattern seen in Italy be used to predict what happens in an individual state, and what are the similarities and differences across states?
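The state-level data are cumulative and in long format (one row per state and date), so the new counts can be obtained by differencing within each state. Below is a minimal sketch with made-up state names and counts; `new_cases` is a hypothetical column name introduced for illustration.

```r
# Made-up long-format data in the same layout as the state-level file.
d <- data.frame(
  date  = as.Date(rep(c("2020-03-01", "2020-03-02", "2020-03-03"), times = 2)),
  state = rep(c("S1", "S2"), each = 3),
  cases = c(1, 4, 9, 0, 2, 2),
  stringsAsFactors = FALSE
)

# Differencing assumes rows are in date order within each state.
d <- d[order(d$state, d$date), ]

# New cases per day within each state; the first day keeps its cumulative value.
d$new_cases <- ave(d$cases, d$state, FUN = function(x) c(x[1], diff(x)))
d
```

The same call with `d$deaths` in place of `d$cases` would give new deaths, assuming that column is present.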

Next I check the relation between the cumulative number of cases and deaths for these 10 states, starting on March

3.2 county level data

First check the 50 counties with the largest number of deaths.

##              date               county                state  fips  cases deaths
## 232498 2020-06-13        New York City             New York    NA 214242  21551
## 231311 2020-06-13                 Cook             Illinois 17031  84581   4173
## 230915 2020-06-13          Los Angeles           California  6037  72023   2890
## 232002 2020-06-13                Wayne             Michigan 26163  21711   2669
## 232497 2020-06-13               Nassau             New York 36059  41172   2668
## 232517 2020-06-13              Suffolk             New York 36103  40615   1996
## 231914 2020-06-13            Middlesex        Massachusetts 25017  23156   1748
## 232423 2020-06-13                Essex           New Jersey 34013  18336   1741
## 232418 2020-06-13               Bergen           New Jersey 34003  18805   1664
## 232525 2020-06-13          Westchester             New York 36119  34252   1535
## 232922 2020-06-13         Philadelphia         Pennsylvania 42101  24338   1502
## 231014 2020-06-13            Fairfield          Connecticut  9001  16277   1345
## 231015 2020-06-13             Hartford          Connecticut  9003  11189   1321
## 232425 2020-06-13               Hudson           New Jersey 34017  18717   1253
## 232436 2020-06-13                Union           New Jersey 34039  16337   1121
## 232428 2020-06-13            Middlesex           New Jersey 34023  16385   1074
## 231983 2020-06-13              Oakland             Michigan 26125  11298   1067
## 231018 2020-06-13            New Haven          Connecticut  9009  12021   1041
## 231910 2020-06-13                Essex        Massachusetts 25009  15573   1041
## 232432 2020-06-13              Passaic           New Jersey 34031  16612    997
## 231918 2020-06-13              Suffolk        Massachusetts 25025  19299    943
## 231970 2020-06-13               Macomb             Michigan 26099   7035    889
## 231916 2020-06-13              Norfolk        Massachusetts 25021   8860    882
## 231920 2020-06-13            Worcester        Massachusetts 25027  11961    863
## 231070 2020-06-13           Miami-Dade              Florida 12086  21632    822
## 232431 2020-06-13                Ocean           New Jersey 34029   9222    813
## 232917 2020-06-13           Montgomery         Pennsylvania 42091   7865    768
## 232030 2020-06-13             Hennepin            Minnesota 27053  10069    712
## 231446 2020-06-13               Marion              Indiana 18097  10736    693
## 231896 2020-06-13           Montgomery             Maryland 24031  13573    686
## 232429 2020-06-13             Monmouth           New Jersey 34025   8720    667
## 232894 2020-06-13             Delaware         Pennsylvania 42045   6894    664
## 231912 2020-06-13              Hampden        Massachusetts 25013   6460    637
## 232943 2020-06-13           Providence         Rhode Island 44007  11959    637
## 232430 2020-06-13               Morris           New Jersey 34027   6568    635
## 231897 2020-06-13      Prince George's             Maryland 24033  17745    634
## 231917 2020-06-13             Plymouth        Massachusetts 25023   8478    621
## 233570 2020-06-13                 King           Washington 53033   8702    593
## 232483 2020-06-13                 Erie             New York 36029   6753    573
## 230814 2020-06-13             Maricopa              Arizona  4013  17791    549
## 232880 2020-06-13                Bucks         Pennsylvania 42017   5419    542
## 232427 2020-06-13               Mercer           New Jersey 34021   7323    517
## 231834 2020-06-13              Orleans            Louisiana 22071   7374    516
## 231027 2020-06-13 District of Columbia District of Columbia 11001   9709    511
## 231908 2020-06-13              Bristol        Massachusetts 25005   7906    507
## 232270 2020-06-13            St. Louis             Missouri 29189   5506    500
## 232509 2020-06-13             Rockland             New York 36087  13411    466
## 231824 2020-06-13            Jefferson            Louisiana 22051   8339    465
## 232434 2020-06-13             Somerset           New Jersey 34035   4756    436
## 231317 2020-06-13               DuPage             Illinois 17043   8402    430

For these 50 counties, I check the number of new cases and the number of new deaths.
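Two details in the listing above matter when differencing by county: county names repeat across states (e.g., Middlesex appears in both Massachusetts and New Jersey), and New York City has no fips code. A sketch that groups on a combined county and state key is shown below. The counts are made up, and `key` and `new_deaths` are hypothetical names introduced for illustration.

```r
# Made-up cumulative deaths for three counties over two days.
d <- data.frame(
  date   = as.Date(rep(c("2020-03-01", "2020-03-02"), times = 3)),
  county = rep(c("Middlesex", "Middlesex", "New York City"), each = 2),
  state  = rep(c("Massachusetts", "New Jersey", "New York"), each = 2),
  fips   = rep(c(25017, 34023, NA), each = 2),
  deaths = c(1, 3, 2, 2, 10, 25),
  stringsAsFactors = FALSE
)

# Group on county plus state: the name alone is ambiguous and fips can be NA.
d$key <- paste(d$county, d$state, sep = ", ")
d <- d[order(d$key, d$date), ]

# New deaths per day within each county; the first day keeps its cumulative value.
d$new_deaths <- ave(d$deaths, d$key, FUN = function(x) c(x[1], diff(x)))
d[, c("date", "key", "deaths", "new_deaths")]
```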

4 COVID Tracking

The positive rate of testing can be an indicator of how much COVID-19 has spread. However, these data can be much noisier, since negative testing results are often not reported and the tests are almost surely taken on a non-representative sample of the population. The COVID Tracking Project provides a grade per state: "If you are calculating positive rates, it should only be with states that have an A grade. And be careful going back in time because almost all the states have changed their level of reporting at different times." (https://covidtracking.com/about-tracker/). The data are also available for both counties and states; here I only look at state-level data.
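The positive rate is the number of positive tests divided by the total number of tests with a reported result. The sketch below restricts the calculation to grade A states, as the quote recommends. The column names (`grade`, `positive`, `negative`) are assumptions about the layout of the COVID Tracking data, and all values are made up.

```r
# Made-up state-level testing data; column names are assumptions.
tracking <- data.frame(
  state    = c("S1", "S2", "S3"),
  grade    = c("A", "B", "A"),
  positive = c(50, 10, 30),
  negative = c(450, 90, 170),
  stringsAsFactors = FALSE
)

# Keep only the states whose reporting is graded A.
a_states <- tracking[tracking$grade == "A", ]

# Positive rate = positive tests / (positive + negative tests).
a_states$positive_rate <-
  a_states$positive / (a_states$positive + a_states$negative)
a_states
```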

The grades of the states may change over time, and I strongly recommend checking their website before putting serious interpretation on the following plot.

5 Session information

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] httr_1.4.1    ggpubr_0.2.5  magrittr_1.5  ggplot2_3.3.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3       pillar_1.4.3     compiler_3.6.2   tools_3.6.2     
##  [5] digest_0.6.23    lattice_0.20-38  nlme_3.1-144     evaluate_0.14   
##  [9] lifecycle_0.2.0  tibble_3.0.1     gtable_0.3.0     mgcv_1.8-31     
## [13] pkgconfig_2.0.3  rlang_0.4.6      Matrix_1.2-18    yaml_2.2.1      
## [17] xfun_0.12        gridExtra_2.3    withr_2.1.2      stringr_1.4.0   
## [21] dplyr_0.8.4      knitr_1.28       vctrs_0.3.0      cowplot_1.0.0   
## [25] grid_3.6.2       tidyselect_1.0.0 glue_1.3.1       R6_2.4.1        
## [29] rmarkdown_2.1    purrr_0.3.3      farver_2.0.3     splines_3.6.2   
## [33] scales_1.1.0     ellipsis_0.3.0   htmltools_0.4.0  assertthat_0.2.1
## [37] colorspace_1.4-1 ggsignif_0.6.0   labeling_0.3     stringi_1.4.5   
## [41] munsell_0.5.0    crayon_1.3.4